This paper presents the use of data-driven models, namely
gene expression programming (GEP), M5 model tree (M5-TREE),
multivariate adaptive regression spline (MARS), and
adaptive neuro-fuzzy inference system (ANFIS), to predict
bridge pier scour depth. Only live-bed data were considered
in the present analysis: 213 data sets drawn from laboratory
tests and field measurements. The gamma test was
performed to determine the ideal input combination for
model development. Five non-dimensional parameters
(sediment coarseness ratio, Froude number, flow intensity,
gradation coefficient of the bed material, and shape factor)
were found to be the vital input parameters for scour depth
model development. The results of these four data-driven
models were compared with those of nine conventional
empirical equations using the performance criteria
correlation coefficient (R), root mean squared error (RMSE),
mean absolute percentage error (MAPE), Nash-Sutcliffe
efficiency (E), and index of agreement (Id), together with
graphical analysis. Based on the performance indices, the ANFIS
model was selected, with R=0.986, RMSE=0.062,
MAPE=6.767, E=0.975 and Id=0.987. The results also show
that the ANFIS model outperforms the other selected
data-driven models and the conventional empirical equations.
This model can also be applied to the modelling of bridge
pier scour in clear-water conditions and can provide insight
into the efficacy of modelling approaches in hydraulic
properties.
1. Introduction


Bridge scour is a natural phenomenon caused by the erosive action of flowing water, which excavates
and removes material from around the piers of a bridge. Scouring at the pier is the primary cause
of bridge failure and can result in huge financial losses and even human casualties [1]. Bridge
scour is a dynamic process that varies with factors such as flow depth, flow angle and velocity,
pier size and shape, and bed material gradation [2-4]. Scouring around bridge piers involves
complex three-dimensional flow and sediment transport processes [5]. Based on the mode of sediment
transport into the scour hole, pier scour is studied under two conditions, namely clear-water and
live-bed conditions [6]. In clear-water conditions the flow deposits no significant sediment in the
scour hole, whereas in live-bed conditions the upstream flow supplies a significant amount of
sediment to the scour hole [7]. Numerous studies have been conducted to understand the flow
mechanism and predict the depth of scour at bridge piers [8-13]. Johnson [14] compared seven
laboratory-based bridge pier scour equations using a large set of field data for both live-bed and
clear-water conditions and found that most of these equations carry uncertainties and over-predict
the scour depth when applied in practice. In another study, comprehensive laboratory and field data
sets were used to evaluate 23 pier-scour equations [15]; the authors concluded that the results
varied even for the same case because of differences in the parameters involved in the equations.
Because flow, bed material, and pier all influence the process, it is challenging to develop
mathematical models of scour [16]. Moreover, there are not enough acceptable models that predict
the scour depth while accounting for all the variations noted above [17]. Most of the empirical
equations were developed from field and laboratory data, and they differ from one another in the
variables considered when developing the scour model, the parameters involved in the equation, the
laboratory conditions, and so on [18]. A precise estimate of the scour depth is necessary for the
safe design of bridge foundations: underestimating it could result in bridge failure, while
overestimating it leads to excessive construction costs [19].
Recognizing these challenges and the importance of better predictions, numerous researchers have
investigated techniques for improving on conventional physics-based analysis. Recently, soft
computing methods have offered impressive solutions to hydrological and hydraulic engineering
problems in which the relation between the input-output pairs is highly complicated and nonlinear
[17,20-22]. Data-driven models (ANN, ANFIS, GEP, SVM, GA, GP, MARS, FFNN, PSO, and M5-TREE) are now
widely applied to estimate scour depth. Azmathullah et al. [23] employed genetic programming (GP)
to predict the scour depth at bridge piers and demonstrated that the GP model predicted scour depth
better than regression equations and an artificial neural network (ANN). Pal et al. [24] used field
data to investigate the potential of the M5-Tree for calculating the local scour around bridge
piers; the M5-Tree outperformed traditional equations. Akib et al. [25] applied an adaptive
neuro-fuzzy inference system (ANFIS) and classical linear regression (LR) to predict scour depth at
bridges and showed that ANFIS was more accurate and precise than LR. Sreedhara et al. [26]
investigated the use of ANFIS and a particle swarm optimization tuned support vector machine
(PSO-SVM) for predicting scour depth around various pier shapes using experimental data and found
that the PSO-SVM model is an effective and reliable strategy for estimating the scour depth at a
pier. Majedi-Asl et al. [27] examined the capacity of the support vector machine (SVM) algorithm to
estimate bridge scour depth as a function of pier shape and demonstrated that SVM outperformed a
nonlinear regression model and gene expression programming (GEP). Roshni and Prakash [28]
investigated the use of feedforward neural network (FFNN) and multivariate adaptive regression
spline (MARS) models to predict the depth of scour around a bridge pier; comparison with empirical
models revealed that the soft computing models were superior. Hassan and Jalal [29] applied GEP to
predict local scour depth at a bridge pier and found that GEP predicts the local scour depth better
than nonlinear regression (NLR) and conventional regression models. Hassan et al. [30] investigated
GEP and a PyTorch-based ANN for estimating local scour depth near bridge piers and concluded that
the equation produced by the PyTorch-based ANN performs better than GEP and NLR. Daneshfaraz et al.
[31] studied experimentally the effect of cables on local scour at bridge piers. They reported that
increasing the cable diameter may decrease the initial and final scour depths, and that the ANN and
ANFIS algorithms are highly capable of estimating scour depth. Although several data-driven models
have been used to estimate scour depth, the mode of sediment transport, the complexity of the
hydraulic characteristics, and the sediment properties themselves call for new models tailored to
specific characteristics and circumstances. Khalid et al. [32] noted that most bridge failures
occur under live-bed conditions during floods, yet only a few studies have used data-driven
methodologies to determine the scour depth at bridge piers in live-bed conditions. These facts
motivated the current research, which uses data-driven models to estimate the scour depth in
live-bed conditions. Choosing an ideal combination of input variables enhances the accuracy of
data-driven models [33]. This research therefore has three objectives: (i) to identify the best set
of input variables for predicting the scour depth around bridge piers in live-bed conditions; (ii)
to assess how well four data-driven models (GEP, M5-Tree, MARS, and ANFIS) perform in estimating the
depth of scour; and (iii) to compare the outcomes of all the data-driven models employed in this
research with those of conventional empirical equations. Fig. 1 illustrates the flowchart of the
methodology of the present research work.


2. Methodology and data collection 


This section explains the specifics of scour depth modeling utilizing data-driven models and 
conventional empirical models. 


2.1. Gamma test 


Scouring is highly dynamic, nonlinear, and complex, and finding the optimal input combination by
trial and error is tedious and laborious. The Gamma test (GT), a non-parametric test, is therefore
used to identify the input variables that are sufficient to build a reliable, smooth model for this
problem. As described by Agalbjorn et al. [34], the GT estimates the minimum mean square error (MSE)
that a smooth model can achieve on the given data, which guides the choice of inputs. GT results
can also be ranked by a second term, the V-ratio, which provides a scale-invariant measure between
0 and 1. Because it is independent of the output range, the V-ratio is convenient for comparing
outputs, including outputs from different data sets. A V-ratio close to zero indicates that the
output is highly predictable by a smooth model. In the present study, the GT was performed using
the winGamma software [35].
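
The near-neighbour procedure behind the Gamma test can be sketched in a few lines. The following is a minimal illustration, not the winGamma implementation: the function name `gamma_test`, the default of `p=10` neighbours, and the brute-force distance computation are assumptions made for this example. It regresses the half mean squared output difference on the mean squared input distance over the first p near neighbours; the intercept is the Gamma statistic and the V-ratio is that intercept divided by the output variance.

```python
import numpy as np


def gamma_test(X, y, p=10):
    """Return (Gamma, slope, V-ratio) for inputs X of shape (M, d) and outputs y of shape (M,).

    Hypothetical helper for illustration; p is the number of near neighbours used.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]

    # Squared Euclidean distance between every pair of input points (brute force).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour

    # Indices of the p nearest neighbours of each point, closest first.
    nn = np.argsort(d2, axis=1)[:, :p]

    # delta(k): mean squared input distance to the k-th nearest neighbour.
    delta = np.take_along_axis(d2, nn, axis=1).mean(axis=0)
    # gamma(k): half the mean squared output difference to the k-th nearest neighbour.
    gamma = 0.5 * ((y[nn] - y[:, None]) ** 2).mean(axis=0)

    # Fit gamma = Gamma + A * delta; the intercept estimates the noise variance.
    slope, intercept = np.polyfit(delta, gamma, 1)
    v_ratio = intercept / y.var()
    return intercept, slope, v_ratio
```

For a smooth function with added noise of variance 0.01, the returned Gamma should be close to 0.01 and the V-ratio close to the fraction of output variance that no smooth model can explain.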


Fig.1. The overall methodology to predict the scour depth near bridge piers in live-bed conditions. 


2.2. GEP 


GEP is a search technique that extends genetic programming (GP) [36] to evolve computer
programs. Ferreira [37] and Teodorescu and Sherwood [38] were the first to encode programs as linear
chromosomes, which are then expressed as expression trees (ETs). In contrast to GP, in which a
single entity acts as both genotype and phenotype, GEP keeps the two separate, which makes it
highly efficient. The formulation of a GEP model involves five steps. Creating an initial population is the


28 S. Shalini, Thendiyath Roshni/ Journal of Soft Computing in Civil Engineering 7-4 (2023) 24-49 


first step. Any population size can be used at this point, although Ferreira [37] suggested that a
population of 30 to 100 chromosomes produces the best results. The second step is to define the
fitness function for an individual chromosome. In the third step, the terminal set and the function
set for constructing chromosomes are selected. The fourth step is to choose the chromosome
architecture, that is, the head size and the number of genes. In the fifth step, the genetic
operators, namely mutation, inversion, insertion sequence (IS) transposition, root insertion
sequence (RIS) transposition, gene transposition, one-point and two-point recombination, and gene
recombination, are tuned to achieve the required accuracy. Fig.2 depicts the methodology used to
model the scour using the GEP approach [39].


Fig.2. Flowchart of GEP modelling procedure [39]. 
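
To make the step from linear chromosome to expression tree concrete, the sketch below decodes a single gene written as a string of symbols (a Karva-style expression read breadth-first) and evaluates it. The single-character symbol alphabet, the dictionaries `ARITY` and `APPLY`, and the function names are assumptions chosen for this illustration; they are not the encoding used by any particular GEP software. The tail length follows the standard GEP relation t = h(n_max - 1) + 1, where h is the head size and n_max is the largest function arity.

```python
import math

# Arity of each function symbol; any other symbol is treated as a terminal (hypothetical alphabet).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "E": 1, "L": 1}
APPLY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "E": math.exp,
    "L": math.log,
}


def tail_length(head_size, max_arity=2):
    """Tail length that guarantees every gene decodes to a valid tree."""
    return head_size * (max_arity - 1) + 1


def evaluate_gene(gene, terminals):
    """Decode a gene breadth-first into an expression tree and evaluate it."""
    root = [gene[0], []]
    queue = [root]
    pos = 1
    while queue:
        node = queue.pop(0)
        # Each function node takes its arguments from the next unread symbols.
        for _ in range(ARITY.get(node[0], 0)):
            child = [gene[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)

    def value(node):
        symbol, children = node
        if symbol in ARITY:
            return APPLY[symbol](*(value(c) for c in children))
        return terminals[symbol]

    return value(root)
```

For example, the gene `+*abcab` (head `+*a`, tail `bcab`) decodes to (b * c) + a; the last two tail symbols are unused, which is what lets genetic operators modify the head freely without producing invalid trees.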


2.3. M5-TREE 


The M5-Tree model, first introduced by Quinlan [40], combines a conventional decision tree, which
partitions a dataset into sub-datasets, with linear regression models at the leaves. It extends the
concept of regression trees, which have constant values at their leaves [41]. The M5-Tree model is
built in three steps: splitting, pruning, and smoothing. In the splitting step, the standard
deviation (SD) of the target values at a node is used as the measure of error, and the split that
maximizes the expected reduction in this error is chosen. Splitting stops when only a few instances
remain at a node or when their values differ only slightly. Splitting often produces an overly
large tree, which is pruned in the next step by replacing sub-trees with leaves. Pruning may lead
to abrupt discontinuities between adjacent leaf models, which are removed by smoothing the pruned
tree in the final step. Fig.3 depicts the methodology used to model the scour using the M5-Tree
approach [42].
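
The splitting criterion can be illustrated with a short sketch of the standard deviation reduction (SDR), SDR = sd(T) - sum(|T_i| / |T| * sd(T_i)), evaluated for a single input. The function names `sdr` and `best_split` and the use of midpoints between sorted values as candidate thresholds are assumptions for this example rather than details taken from a specific M5 implementation.

```python
import numpy as np


def sdr(y, left_mask):
    """Standard deviation reduction achieved by splitting y into y[left_mask] and the rest."""
    y = np.asarray(y, dtype=float)
    left, right = y[left_mask], y[~left_mask]
    n = len(y)
    return y.std() - (len(left) / n) * left.std() - (len(right) / n) * right.std()


def best_split(x, y):
    """Return (threshold, SDR) of the best binary split of y on a single input x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.unique(x)
    # Candidate thresholds: midpoints between consecutive distinct input values.
    thresholds = (values[:-1] + values[1:]) / 2.0
    scores = [sdr(y, x <= t) for t in thresholds]
    best = int(np.argmax(scores))
    return float(thresholds[best]), float(scores[best])
```

A full tree would apply `best_split` to every input at every node and recurse on the two subsets until the stopping rule is met.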


Fig.3. Flowchart of M5-Tree modelling procedure [42].


2.4. MARS


Multivariate adaptive regression splines (MARS), introduced by Friedman [43], capture the nonlinear
relationship between a set of input variables and a dependent variable in high-dimensional datasets
using a sequence of piecewise segments known as splines. The input data are divided into
sub-regions, each fitted by its own spline, and knots mark the beginning and end points of these
segments [44]. These piecewise curves, known as basis functions (BFs), can capture nonlinearities,
which makes the model flexible. MARS generates BFs through a stepwise search over all candidate
univariate knot positions and over all interactions between the variables. MARS development has two
phases. In the forward phase, the model is built by adding BFs and estimating their regression
coefficients. In the backward phase, MARS eliminates the least effective terms and the over-fitting
components to improve the generalizability of the model (Zhang et al. [45]). Fig.4 depicts
the methodology used to model the scour using the MARS approach.
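
The basis functions described above are usually hinge functions of the form max(0, x - t) and max(0, t - x), where t is a knot. The sketch below shows one mirrored pair and a least-squares fit of their coefficients for a fixed knot; the function names and the choice of a single, user-supplied knot are assumptions for illustration, whereas a real MARS forward phase searches over knots and variables.

```python
import numpy as np


def hinge(x, knot, direction=+1):
    """Hinge basis function: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return np.maximum(0.0, direction * (np.asarray(x, dtype=float) - knot))


def fit_hinge_pair(x, y, knot):
    """Least-squares coefficients of y ~ c0 + c1*max(0, x-knot) + c2*max(0, knot-x)."""
    B = np.column_stack([np.ones(len(x)), hinge(x, knot, +1), hinge(x, knot, -1)])
    coef, *_ = np.linalg.lstsq(B, np.asarray(y, dtype=float), rcond=None)
    return coef
```

Interactions between variables, controlled by the degree of interaction, are formed by multiplying hinge functions of different inputs.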


2.5. ANFIS 


The adaptive neuro-fuzzy inference system (ANFIS), proposed by Jang [46], is a hybrid scheme that
combines fuzzy inference system (FIS) and ANN techniques. Fundamentally, ANFIS uses an ANN to tune
the membership functions (MFs) of the FIS through a learning process that involves two methods:
back-propagation gradient descent and least squares. ANFIS can be implemented with different fuzzy
inference types, such as Takagi-Sugeno, Mamdani, and Tsukamoto [47]. The four building blocks of a
fuzzy inference system (FIS), namely the fuzzifier, the fuzzy inference engine, the knowledge base,
and the defuzzifier, incorporate the competence of an expert into the system design (Fig. 5) [48].
Fig.4. Flowchart of MARS modelling procedure.


Fig. 5. A flow diagram of a fuzzy inference system (FIS) with crisp input and crisp output [48].
A neuro-fuzzy network with two inputs, one output, and two rules is shown in Fig. 6 [25].
Fig. 6 illustrates the five layers that make up the architecture of ANFIS: a fuzzification layer
(layer 1), an implication layer (layer 2), a normalization layer (layer 3), a defuzzification layer
(layer 4), and a combining (output) layer (layer 5). A layer contains one of two kinds of nodes: the
nodes in layers 2, 3, and 5 are fixed, whereas the nodes in layers 1 and 4 are adaptive. To
determine the ideal ANFIS architecture, a trial-and-error approach over the type, shape, and number
of membership functions for the various input parameters should be used.


Fig. 6. Architecture of ANFIS [25]. 
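
The five layers can be traced with a small forward pass for a two-input, two-rule first-order Sugeno system. This is a minimal sketch under stated assumptions: triangular membership functions, product as the implication operator, and hypothetical function and parameter names (`trimf`, `anfis_forward`, `mf_x`, `mf_y`, `consequents`). It does not include the hybrid training step, and it assumes at least one rule fires so that the normalization is defined.

```python
def trimf(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b (a < b < c)."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))


def anfis_forward(x, y, mf_x, mf_y, consequents):
    """Forward pass: rule i pairs mf_x[i] with mf_y[i] and has consequent (p, q, r)."""
    # Layer 1 (adaptive): fuzzify each crisp input.
    mu_x = [trimf(x, *params) for params in mf_x]
    mu_y = [trimf(y, *params) for params in mf_y]
    # Layer 2 (fixed): firing strength of each rule as the product of its memberships.
    w = [mx * my for mx, my in zip(mu_x, mu_y)]
    # Layer 3 (fixed): normalize the firing strengths.
    total = sum(w)
    w_norm = [wi / total for wi in w]
    # Layer 4 (adaptive): first-order consequent f_i = p*x + q*y + r of each rule.
    f = [p * x + q * y + r for p, q, r in consequents]
    # Layer 5 (fixed): weighted sum gives the crisp output.
    return sum(wn * fi for wn, fi in zip(w_norm, f))
```

With membership functions (-1, 0, 1) and (0, 1, 2) on both inputs and consequents (1, 1, 0) and (2, 0, 1), the inputs x = 0.25 and y = 0.5 give normalized firing strengths 0.75 and 0.25 and an output of 0.9375.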


2.6. Conventional empirical equations 


Based on previous experimental studies, different empirical equations have been developed for 
estimating scour depth around bridge piers in live-bed conditions. Among these, nine pier 
scouring equations have been selected for this study in order to evaluate the performance of data- 
driven models. Table 1 displays the chosen conventional empirical equations. 
Table 1

The existing conventional empirical equations for determination of scour depth in live-bed condition.

Authors | Conventional empirical equation | Eqn no.
Laursen and Toch [8] | $D_s/Y = 1.35\,(D_p/Y)^{0.7}$ | (1)
Larras [49] | $D_s = 1.05\,D_p^{0.75}$ | (2)
Breusers [50] | $D_s = 1.4\,D_p$ | (3)
Shen et al. [9] | $D_s/Y = 3.4\,Fr^{0.67}\,(D_p/Y)^{0.67}$ | (4)
Hancu [51] | $D_s/D_p = 2.24\,(2U/U_c - 1)\,\left(U_c^2/(g D_p)\right)^{1/3}$ | (5)
Breusers [10] | $D_s/D_p = f(U/U_c)\,\left[2\tanh(Y/D_p)\right]$, with $f = 0$ for $U/U_c < 0.5$; $f = 2U/U_c - 1$ for $0.5 \le U/U_c < 1.0$; $f = 1$ for $U/U_c \ge 1.0$ | (6)
Melville and Sutherland [11] | $D_s/Y = 2.4\,D_p/Y$ | (7)
Melville [52] | $D_s/D_p = K_I K_d K_{yD}$, with $K_I = U/U_c$ if $U/U_c < 1$ and $K_I = 1$ otherwise; $K_d = 0.57\log(2.24\,D_p/d_{50})$ if $D_p/d_{50} \le 25$ and $K_d = 1$ otherwise; $K_{yD} = 2.4$ if $D_p/Y < 0.7$, $K_{yD} = 2\sqrt{Y/D_p}$ if $0.7 < D_p/Y < 5$, and $K_{yD} = 4.5\,Y/D_p$ if $D_p/Y > 5$ | (8)
Richardson and Davis [5] | $D_s/Y = 2.0\,(D_p/Y)^{0.65}\,Fr^{0.43}$ | (9)
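
As an illustration of how such equations are applied, the sketch below codes three of the simpler entries of Table 1: Larras (Eq. 2), Breusers (Eq. 3), and the piecewise Breusers relation (Eq. 6). The function and argument names are hypothetical, the forms follow the commonly cited versions of these equations, and consistent length and velocity units are assumed; the sketch should be checked against the original references before any design use.

```python
import math


def larras(d_p):
    """Eq. (2): D_s = 1.05 * D_p**0.75 (D_p is the pier diameter)."""
    return 1.05 * d_p ** 0.75


def breusers_1965(d_p):
    """Eq. (3): D_s = 1.4 * D_p."""
    return 1.4 * d_p


def breusers_1977(d_p, y, u, u_c):
    """Eq. (6): D_s = D_p * f(U/U_c) * 2 * tanh(Y/D_p), with a piecewise flow-intensity factor."""
    ratio = u / u_c
    if ratio < 0.5:
        f = 0.0            # no scour predicted at low flow intensity
    elif ratio < 1.0:
        f = 2.0 * ratio - 1.0
    else:
        f = 1.0            # live-bed condition
    return d_p * f * 2.0 * math.tanh(y / d_p)
```

In live-bed conditions the flow intensity U/U_c is at least 1, so the piecewise factor in Eq. (6) equals 1 and the predicted scour depth depends only on the pier diameter and the relative flow depth.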
2.7. Dataset


For this research, live-bed pier scour data were compiled from the studies of Chabert and
Engeldinger [53], Shen et al. [9], Norman [54], Jain and Fischer [55], Chee [56], Chiew [57],
Butch [58], Wilson [59], US Geological Survey [60], Sheppard and Miller [61], and Holnbeck
[62]. According to Kothyari [63], scour at a bridge pier depends on the approach mean velocity (U),
the critical velocity of the sediment (Uc), the pier diameter (Dp), the median diameter of the
sediment (d50), the approach flow depth (Y), and the Froude number (Fr). In total, 213 data sets
were used for model development. The range of the input data is displayed in Table 2. About 75%
(160 sets) of the 213 input-output pairs were randomly chosen for training, whereas the remaining
25% (53 sets) were used for testing [64].
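
A random 160/53 partition of this kind can be reproduced with a seeded permutation of the record indices. The sketch below is illustrative only: the function name and the seed value are assumptions, and the paper does not state how its random selection was generated.

```python
import numpy as np


def train_test_split_indices(n_samples, n_train, seed=0):
    """Return disjoint index arrays for training and testing (seed is a hypothetical choice)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return order[:n_train], order[n_train:]


# 213 records in total: 160 for training (about 75%) and 53 for testing (about 25%).
train_idx, test_idx = train_test_split_indices(213, 160)
```

Fixing the seed keeps the same partition across all four models, so that their test-set scores are comparable.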


The scour depth at a bridge pier is affected by the characteristics of the flow, the bed sediment,
and the pier. It can be expressed by the functional relationship in Eq. (10):


$D_s = f(U, U_c, Y, g, d_{50}, D_p, \sigma_g, K_s)$ (10)


All piers had a zero angle of alignment with the flow. Circular, square, rectangular, and
cylindrical pier shapes were all included in this investigation.


Table 2
Selected input dataset ranges for the model development.
Author | Dp/d50 | Fr | U/Uc | σg | Ks | Dp/Y | No. of data
Chabert and Engeldinger [53] | … | … | … | 1.3 | 1 | 0.254-1.329 | 12
Shen et al. [9] | 331.304 | 0.289 | 1.262 | 2.2 | 0.9 | 0.941 | 1
Norman [54] | … | … | … | 3-5.5 | 0.9-1 | 0.194-0.5 | 2
Jain and Fischer [55] | … | … | … | 1.3 | 1 | 0.351-1.811 | 30
Chee [56] | … | … | … | … | … | … | 85
Chiew [57] | … | … | … | … | … | … | 82
Butch [58] | … | … | … | … | … | … | …
Wilson [59] | … | … | … | … | … | … | …
U.S. Geological Survey [60] | 56.444 | 0.452 | 1.419 | 2.3 | 1 | 0.186 | 1
Sheppard and Miller [61] | 181.428 | … | … | 1.3 | 1 | 0.448-0.989 | 11
Holnbeck [62] | … | … | … | … | … | … | …


3. Model performance evaluation 


The effectiveness of the developed models was evaluated in the current study using several
statistical performance measures. As listed in Table 3, the following performance indices were
used: correlation coefficient (R), root mean squared error (RMSE), mean absolute percentage
error (MAPE), Nash-Sutcliffe efficiency (E), and index of agreement (Id).


Table 3
List of statistical performance measures.


Statistical performance measure | Expression | Eqn no.
Correlation coefficient (R) | $R = \dfrac{\sum_{i=1}^{n}(P_i-\bar{P})(O_i-\bar{O})}{\sqrt{\sum_{i=1}^{n}(P_i-\bar{P})^2\,\sum_{i=1}^{n}(O_i-\bar{O})^2}}$ | (11)
Root mean squared error (RMSE) | $\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(P_i-O_i)^2}$ | (12)
Mean absolute percentage error (MAPE) | $\mathrm{MAPE} = \dfrac{1}{n}\sum_{i=1}^{n}\dfrac{\lvert P_i-O_i\rvert}{O_i}\times 100$ | (13)
Nash-Sutcliffe efficiency (E) | $E = 1-\dfrac{\sum_{i=1}^{n}(O_i-P_i)^2}{\sum_{i=1}^{n}(O_i-\bar{O})^2}$ | (14)
Index of agreement (Id) | $I_d = 1-\dfrac{\sum_{i=1}^{n}(O_i-P_i)^2}{\sum_{i=1}^{n}\left(\lvert P_i-\bar{O}\rvert+\lvert O_i-\bar{O}\rvert\right)^2}$ | (15)


Here O is the observed value, P is the predicted value, $\bar{O}$ is the mean of the observed
values, $\bar{P}$ is the mean of the predicted values, and n is the number of observations. The
correlation coefficient (R) expresses the quality of the linear relationship between the predicted
and observed data. The RMSE measures the discrepancy between the values predicted by a model and
the values actually observed. MAPE is a measure of prediction accuracy expressed as a relative
error. The ideal model has an RMSE of 0 and an R of 1, and MAPE values below 10% are desirable
[65]. The index of agreement (Id) is based on the ratio of the mean square error to the potential
error; it ranges from 0 (no agreement) to 1 (perfect fit between predicted and observed values)
[66]. To evaluate a model's predictive ability, Nash and Sutcliffe [67] suggested the
Nash-Sutcliffe efficiency (E), which ranges from $-\infty$ to 1. The closer the model efficiency is
to 1, the more accurate the model.
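
The five indices of Table 3 can be computed together from paired observed and predicted values. The sketch below assumes the standard definitions of these measures and strictly positive observations (needed for MAPE); the function name and the dictionary keys are choices made for this example.

```python
import numpy as np


def performance_indices(observed, predicted):
    """Return R, RMSE, MAPE (%), Nash-Sutcliffe E, and index of agreement Id."""
    O = np.asarray(observed, dtype=float)
    P = np.asarray(predicted, dtype=float)
    O_mean, P_mean = O.mean(), P.mean()

    r = np.sum((P - P_mean) * (O - O_mean)) / np.sqrt(
        np.sum((P - P_mean) ** 2) * np.sum((O - O_mean) ** 2)
    )
    rmse = np.sqrt(np.mean((P - O) ** 2))
    mape = np.mean(np.abs(P - O) / O) * 100.0  # assumes all observations are positive
    e = 1.0 - np.sum((O - P) ** 2) / np.sum((O - O_mean) ** 2)
    i_d = 1.0 - np.sum((O - P) ** 2) / np.sum(
        (np.abs(P - O_mean) + np.abs(O - O_mean)) ** 2
    )
    return {"R": r, "RMSE": rmse, "MAPE": mape, "E": e, "Id": i_d}
```

For observed values 1, 2, 3, 4 and predictions 1.1, 1.9, 3.2, 3.8, this gives an RMSE of about 0.158, a MAPE of about 6.67%, and E = 0.98.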


3. Results 


3.1. Gamma test for selection of input parameters


Gamma tests were performed for the candidate input combinations, and the detailed results are
shown in Table 4. The input combination that yields the lowest absolute gamma value is considered
ideal. For m scalar inputs there are 2^m - 1 possible combinations, but many of them are not
meaningful. Table 4 shows that the combination of all five parameters, with mask 11111, has the
lowest gamma and V-ratio values (0.047 and 0.306, respectively), that is, the values closest to
zero among the tested combinations, and can therefore provide a more appropriate model than the
other combinations. For the present study, this combination of five non-dimensional parameters was
used to develop the models.
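
The mask notation can be made concrete by enumerating every non-empty subset of the five inputs. In the sketch below the input labels and their order (Dp/d50, Fr, U/Uc, sigma_g, Ks) are inferred from the masks in Table 4, and the function name is hypothetical.

```python
from itertools import product

# Order of the mask bits, inferred from Table 4 (labels are plain-text stand-ins).
INPUTS = ["Dp/d50", "Fr", "U/Uc", "sigma_g", "Ks"]


def input_masks(names):
    """Map each non-empty binary mask to the list of inputs it selects."""
    masks = {}
    for bits in product("01", repeat=len(names)):
        mask = "".join(bits)
        if "1" in mask:  # skip the empty combination 00000
            masks[mask] = [name for name, bit in zip(names, bits) if bit == "1"]
    return masks
```

For five inputs this yields 2^5 - 1 = 31 masks; for example, mask 10011 selects Dp/d50, sigma_g, and Ks.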


Table 4
Determining the best combination for scour depth modelling.
Sl. No. | Input combination | Gamma | Standard error | V-ratio | Mask
1 | Dp/d50, Fr, U/Uc, σg, Ks | 0.047 | 0.024 | 0.306 | 11111
2 | Dp/d50, Fr, U/Uc, σg | 0.047 | 0.024 | 0.307 | 11110
3 | Dp/d50, Fr, U/Uc | 0.056 | 0.022 | 0.365 | 11100
4 | Dp/d50, Fr | 0.057 | 0.027 | 0.370 | 11000
5 | Fr, U/Uc, σg, Ks | 0.075 | 0.006 | 0.487 | 01111
6 | Dp/d50 | 0.144 | 0.029 | 0.934 | 10000
7 | Dp/d50, U/Uc, σg, Ks | 0.048 | 0.023 | 0.314 | 10111
8 | Dp/d50, σg, Ks | 0.143 | 0.018 | 0.929 | 10011
9 | Dp/d50, Ks | 0.146 | 0.026 | 0.949 | 10001


Based on the gamma test findings, the models in this study use five non-dimensional parameters as
input and the ratio of scour depth to flow depth as output. The final non-dimensional function,
Eq. (16), describes how the variables affect the scour depth at a bridge pier in live-bed
conditions:


$D_s/Y = f(D_p/d_{50},\ Fr,\ U/U_c,\ \sigma_g,\ K_s)$ (16)


where Dp/d50 represents the sediment coarseness, Fr is the Froude number, U/Uc is the flow
intensity, σg is the gradation coefficient of the bed material, and Ks is the shape factor.


Fig.7 depicts the non-dimensional relationship between the selected inputs and the desired output.
It summarizes the correlation matrix, which uses Pearson's correlation coefficient, Eq. (17), to
describe the linear association between the aforementioned variables. The Pearson correlation
ranges from -1 to 1, where a value of +1 indicates a perfect direct linear relationship between two
variables and a value of -1 a perfect inverse one.


$r_p = \dfrac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$ (17)


where X is the value of the predictor, Y is the value of the target, $\bar{X}$ is the average value
of the predictor, and $\bar{Y}$ is the average value of the target. According to Fig.7, Fr
(rp=0.61) has the highest linear impact on the scour depth ratio (Ds/Y), followed by U/Uc
(rp=0.45), Dp/d50 (rp=0.29), and Ks (rp=0.029), respectively.


Fig. 7. The plot of the correlation matrix between the input and output variable. 


3.2. Modelling with GEP 


For the GEP formulation, the laboratory and field data set of the current study was used. To
begin, the entire data set was split into a training set (75%) and a testing set (25%). The GEP
parameters and procedures were then established in five steps to develop the mathematical equation
required for estimating scour depth. In the first step, several trials were performed to determine
the ideal population size, and a population of 30 chromosomes was adopted because it produced the
best results. In the second step, the fitness of each chromosome was measured using the RMSE. In
the third step, the terminal set and the function set for constructing chromosomes were selected
(Table 5). In the fourth step, the number of genes per chromosome was set to three and the head
size to eight. In the fifth step, the genetic operators and their rates were selected; the
sub-expression trees (sub-ETs) encoded by the three genes are linked by addition (+) to form the
final equation. After all the necessary parameters had been specified, the software GeneXproTools
5.0 was run; it provides an explicit and concise mathematical equation for estimating the depth of
scour around bridge piers. Table 5 lists the best genetic operator values among all those tried.
The resulting scour depth formula is shown as an expression tree (ET) in Fig. 8, and the associated
equation is written as Eq. (18). The values of the constants in Eq. (18) are G1C7=0.968,
G1C4=-3.686, G2=-8.303, G3C9=3.529 and G3C4=10.775. Fig. 8 shows the ETs of the GEP model, in which
d0 = Dp/d50, d1 = Fr, d2 = U/Uc, d3 = σg, and d4 = Ks.


Table 5
Parameters of the optimized GEP model.
Serial number | Description of parameter | Parameter setting
1 | Chromosomes | 30
2 | Genes | 3
3 | Head size | 8
4 | Number of generations | 150979
5 | Mutation rate | 0.044
6 | Inversion rate | 0.1
7 | Function set | +, -, *, /, power, Exp, Ln
8 | One-point recombination rate | 0.3
9 | Two-point recombination rate | 0.3
10 | Gene recombination rate | 0.1
11 | Gene transposition rate | 0.1
12 | Linking function | Addition
13 | Program size | 38
14 | Fitness function | RMSE

The explicit equation derived from Sub-ETs to create the GEP model may be written as: 


UU 
3.69 F, +3.69 _ 
Dy _ Uc Uc" 0.12, (2.38%K28)%* si) 
7 De 0.97+F, (K,-8.30)F, U 
Dso Us 


3.3. Modeling with M5-tree 


M5-TREE modelling was carried out using the Waikato Environment for Knowledge Analysis (WEKA) [68],
a well-known suite of machine learning tools developed at the University of Waikato. The M5-TREE is
a tree-based regression technique that requires only one parameter to be chosen for a specific
dataset: the minimum number of training instances allowed at a terminal node. After several trials,
a minimum of six training instances was found to work best with this input data. Table 6 shows the
correlation coefficient and RMSE values obtained with the M5 modelling approach. One important
advantage of the M5-TREE is that it yields four simple linear relations (Eqs. (19), (20), (21), and
(22)), which make it easy to estimate pier scour from laboratory and field data. M5-TREE(8)
produced the best outcome among all the M5-TREE models tried, and its tree, with the linear model
used in each region of the input space, is shown in Fig. 9.
Fig. 8. Expression tree of the GEP model (Sub-ET 1, Sub-ET 2, and Sub-ET 3).


Table 6 
Parameters variation with M5-TREE Model. 

Model Instances Number | Percentage split | Number of rules | Correlation coefficient | RMSE 
M5-TREE(1) 1 10 1 0.613 0.329 
M5-TREE(2) 1 20 1 0.851 0.217 
M5-TREE(3) 1 30 1 0.846 0.216 
M5-TREE(4) 1 40 1 0.777 0.245 
M5-TREE(5) 1 50 1 0.768 0.257 
M5-TREE(6) 1 25 1 0.856 0.219 
M5-TREE(7) 5 25 1 0.856 0.219 
M5-TREE(8) 6 25 4 0.856 0.219 
M5-TREE(9) 15 25 4 0.779 0.206 
M5-TREE(10) 20 25 1 0.779 0.206 
M5-TREE(11) 25 25 1 0.755 0.293 
M5-TREE(12) 5 20 1 0.851 0.217 
M5-TREE(13) 6 20 4 0.851 0.217 
M5-TREE(14) 10 20 4 0.852 0.216 
M5-TREE(15) 15 20 4 0.812 0.246 
Fig. 9. Tree representation of the best model (M5-TREE (8) in Table 6). Leaf models: LM 1
(27/44.862%), LM 2 (73/37.802%), LM 3 (67/31.982%), and LM 4 (46/85.246%).


Conditions for LM 1:
D1 <= 0.741, D3 <= 1.95, D0 <= 54.674


LM 1: $D_s/Y = 0.001\,(D_p/d_{50}) + 0.4697\,Fr - 0.0925\,(U/U_c) - 0.1784\,\sigma_g + 0.5049$ (19)
Conditions for LM 2:
D1 <= 0.741, D3 <= 1.95, D0 > 54.674


LM 2: $D_s/Y = 0.0018\,(D_p/d_{50}) + 2.0185\,Fr - 0.4339\,(U/U_c) - 0.8123\,\sigma_g + 1.269$ (20)


Conditions for LM 3:
D1 <= 0.741, D3 > 1.95


LM 3: $D_s/Y = 0.0011\,(D_p/d_{50}) - 2.888\,Fr + 0.1554\,(U/U_c) - 0.0487\,\sigma_g + 0.244$ (21)

Conditions for LM 4:
D1 > 0.741


LM 4: $D_s/Y = 0.0011\,(D_p/d_{50}) + 0.6772\,Fr - 0.0297\,(U/U_c) - 0.0268\,\sigma_g + 0.3758$ (22)


where D0 = sediment coarseness ratio, D1 = Froude number, and D3 = gradation coefficient of
the bed material.
3.4. Modeling with MARS


The ARESLab adaptive regression splines toolbox (version 1.13.0) for MATLAB was used to
construct the MARS model in the present work. MARS does not require the form of the function
between the inputs and outputs to be established a priori [69]. The number of basis functions in
the forward step (BF(F)) and the degree of interaction (DOI) influence the performance of the MARS
model [70]. In this study, BF(F) and DOI were varied from 1 to 50 and from 0 to 4, respectively,
and the values that produced the most accurate estimate of equilibrium scour depth were selected.
Compared with models with other DOI values, the MARS model with a DOI of 4 performed best for the
present dataset. As explained previously, the MARS model is developed in two phases. A maximum of
fifty basis functions was allowed in the forward phase, two basis functions were removed in the
backward phase, and the best MARS equation was obtained with 37 basis functions (Table 7).
Generalized cross-validation (GCV) is used in the forward selection and backward deletion process
to develop the final MARS models [44]. Table 7 lists the parameters of each MARS model. MARS-10
gave the best performance among all the MARS models tried on the current dataset.


Table 7 
Parameter variation with MARS model. 

Model | Max-BF | Max-IN | BF(F) | BF(B) | BF(M) | DOI | NVM R RMSE 
MARS-1 1 5 1 0 1 0 0 1.34E-31 0.388 
MARS-2 5 5 5 0 5 1 2 0.596 0.246 
MARS-3 10 5 9 1 8 2 4 0.763 0.188 
MARS-4 15 5 15 3 12 3 4 0.890 0.128 
MARS-5 20 5 18 2 15 3 5 0.909 0.117 
MARS-6 25 5 24 1 18 3 5 0.927 0.104 
MARS-7 30 5 28 2 22 3 5 0.940 0.094 
MARS-8 35 5 34 1 25 3 5 0.948 0.088
MARS-9 40 5 38 2 33 4 5 0.955 0.082 

MARS-10 50 5 48 2 37 4 5 0.957 0.080 


Where, Max-BF=Maximum Basis function, Max-IN=Maximum number of interaction, BF (F)= 
Basis function in forward step, BF (B)= Basis function in backward,BF (M)=Basis function in 
final model, DOI=Degree of interaction, NVM=Number of variables used in model, R= 
Correlation coefficient and (RMSE)=Root mean square error. 


3.5. Modeling with ANFIS 


In this study, the ANFIS model was developed with the chosen input parameters for estimating scour
depth. It was built in MATLAB and tested with a variety of membership functions (MFs), including
the triangular (trimf), trapezoidal (trapmf), generalized bell-shaped (gbellmf), Gaussian
(gaussmf), and Gaussian combination (gauss2mf) membership functions. There are two options for
generating the fuzzy model: subtractive clustering (which requires less computational effort) and
grid partitioning (which requires more computational effort). To find the best ANFIS model,
different MF types were tried for each input parameter, with the number of membership functions per
input varied from 2 to 3. The grid partition (GP) technique with the triangular MF (trimf) gave the
best performance among the MFs tested. Table 8 provides the configuration of the best ANFIS model
for prediction of bridge pier scour depth.


Table 8 
Parameters of the best ANFIS model. 
Serial number | Architecture of ANFIS | Parameter setting 
1 | Number of membership functions | 3 3 3 3 3 
2 | Algorithm selected | Hybrid 
3 | Number of epoch runs given | 100 
4 | Generated fuzzy inference system | Grid partition 
5 | Membership function (MF) type | linear 
6 | Type of membership function (MF) used | trimf 
7 | Number of nodes | 524 
8 | Number of linear parameters | 1458 
9 | Number of nonlinear parameters | 45 
10 | Total number of parameters | 1503 
11 | Number of fuzzy rules | 243 
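
The rule and parameter counts in Table 8 follow from the grid-partition structure, as the 
Python sketch below illustrates. It assumes a first-order Sugeno system (linear consequents, 
as the "linear" entry in Table 8 suggests) and three parameters per triangular MF. The 
function names are hypothetical, and the code is not the MATLAB implementation used in the 
study. 

```python
def trimf(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b.

    Assumes a < b < c; returns a membership degree in [0, 1].
    """
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def grid_partition_counts(n_inputs, mfs_per_input, params_per_mf=3):
    """Rule and parameter counts for a grid-partitioned first-order Sugeno ANFIS.

    Every combination of one MF per input forms a rule; each rule has one
    linear coefficient per input plus a constant term (an assumption that
    matches a 'linear' output MF).
    """
    rules = mfs_per_input ** n_inputs
    linear = rules * (n_inputs + 1)
    nonlinear = n_inputs * mfs_per_input * params_per_mf
    return {
        "rules": rules,
        "linear": linear,
        "nonlinear": nonlinear,
        "total": linear + nonlinear,
    }


# Five inputs with three triangular MFs each, as in Table 8.
counts = grid_partition_counts(n_inputs=5, mfs_per_input=3)
print(counts)
print(trimf(0.25, 0.0, 0.5, 1.0))
```

The exponential growth of the rule count with the number of inputs is the reason grid 
partitioning needs more computational effort than subtractive clustering. 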


3.6. Performance evaluation of data-driven models with conventional empirical equations 


Here, we have used nine conventional empirical equations to estimate scour depth at a bridge 
pier under live-bed conditions. The results of the four data-driven models used for the same 
purpose have been compared with the results of these nine conventional equations to evaluate 
the effectiveness of the suggested model (Table 9). The comparison shows that the ANFIS model 
outperformed the other three data-driven models and all the empirical equations. The highest 
and lowest 
correlation coefficients among the empirical equations, 0.913 and 0.717, were obtained with the 
Larras [49] and Hancu [51] equations, respectively. The highest and lowest RMSE values, 0.919 
and 0.244, were obtained with the Shen et al. [9] and Breusers [50] equations, respectively. 
The Shen et al. [9] equation gives the maximum MAPE of 168.54, and the Breusers [10] equation 
gives the minimum MAPE of 44.525. The highest and lowest values of E, 0.624 and -4.292, were 
obtained with the Breusers [50] and Shen et al. [9] equations, respectively. The highest and 
lowest values of the index of agreement (Id), 0.831 and 0.521, were obtained with the Larras 
[49] and Breusers [10] equations, respectively. 
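
The five statistics used in this comparison can be computed as in the following Python sketch. 
The definitions are the standard ones (Pearson correlation, Nash-Sutcliffe efficiency, 
Willmott's index of agreement) and are assumed here to match those used in the study. The 
function name scour_metrics and the sample values are hypothetical. 

```python
import math


def scour_metrics(observed, predicted):
    """Return R, RMSE, MAPE (%), E and Id for paired observed/predicted values.

    Standard textbook definitions are assumed; observed values must be
    non-zero for MAPE and must not all be equal for R and E.
    """
    n = len(observed)
    pairs = list(zip(observed, predicted))
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n

    sq_err = sum((o - p) ** 2 for o, p in pairs)
    cov = sum((o - o_mean) * (p - p_mean) for o, p in pairs)
    var_o = sum((o - o_mean) ** 2 for o in observed)
    var_p = sum((p - p_mean) ** 2 for p in predicted)
    potential = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2 for o, p in pairs)

    return {
        "R": cov / math.sqrt(var_o * var_p),             # correlation coefficient
        "RMSE": math.sqrt(sq_err / n),                   # root mean squared error
        "MAPE": 100.0 / n * sum(abs((o - p) / o) for o, p in pairs),
        "E": 1.0 - sq_err / var_o,                       # Nash-Sutcliffe efficiency
        "Id": 1.0 - sq_err / potential,                  # index of agreement
    }


# Made-up scour depth ratios for illustration only.
metrics = scour_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```

Because E compares the squared error with the variance of the observations, it cannot exceed 1 
and becomes negative when an equation predicts worse than the observed mean. 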


4. Discussion 


4.1. Graphical Analysis of Models 


The accuracy of the proposed data-driven models in predicting the depth of scour around bridge 
piers in live-bed conditions has been examined graphically. The error distributions of the four 
data-driven models are shown as violin plots in Fig. 10. From Fig. 10, the maximum and minimum 
relative deviations are 65.55 and 53.06 for the GEP model, 57.38 and -111.19 for the M5-Tree 
model, 26.29 and -42.15 for the MARS model, and 24.57 and -48.70 for the ANFIS model, 
respectively. These results show that the distribution of errors in the ANFIS model is better 
than in the other three data-driven models. Furthermore, the error distribution of the MARS 
model closely resembles that of the ANFIS model (Fig. 10). Therefore, the MARS model can be 
regarded as the second-best model for predicting scour depth, after the ANFIS model, and it 
also performs significantly better than the GEP and M5-Tree models. 


[Figure: violin plots of relative deviation (%) for the GEP, M5-TREE, MARS, and ANFIS models; 
vertical axis from about -150 to 100.] 


Fig. 10. Violin plot of error distribution for the data-driven models. 


Fig. 11 shows the scatter plots of observed versus predicted scour depth ratio in live-bed 
conditions for all data and models in the present research. From Fig. 11, the coefficient of 
determination (R²) for the ANFIS model equals 0.9741, the highest among all four data-driven 
models and nine conventional equations. The MARS model also performed well, with R² equal to 
0.9573, second only to the ANFIS model, although it under-predicted more scour depth ratio 
measurements than the ANFIS model. The GEP and M5-Tree models have R² values of 0.7855 and 
0.6997, respectively, which are lower than those of ANFIS and MARS. The conventional equations 
of [5,8,9,11,49,52] over-predict the scour depth ratio, which would result in overdesigned 
bridge foundations, whereas the equations of [10,50,51] give a mix of under-predicted and 
over-predicted values. From Fig. 11, the maximum and minimum over-prediction occur with Shen et 
al. [9] and Breusers [10], which have R² equal to 0.7596 and 0.6523, respectively. Fig. 11 also 
shows that the Larras [49] equation performed well quantitatively, with a coefficient of 
determination of 0.8339, the highest among the nine conventional equations used in the present 
study. However, the predicted values of the Larras [49] equation show only slight variance 
because the equation is based solely on pier characteristics and is not sensitive to hydraulic 
or sediment factors. None of the conventional empirical equations estimated the scour depth as 
consistently as the data-driven models for the live-bed condition, as illustrated in Fig. 11. 
The ANFIS model results are closest to the best-fit line, which indicates better accuracy. The 
ANFIS model therefore adjusts effectively to the complex non-linear relationship between the 
parameters and can be adopted for prediction of scour, which has to be considered in the design 
of hydraulic structures. 
Fig. 11. Scatter plots of observed versus predicted scour depth ratio in live-bed conditions for all data and 
models in the present research. 


4.2. Comparison of the Data-driven models with existing conventional empirical equations 


In this section, the data-driven GEP, M5-Tree, MARS, and ANFIS models have been used to predict 
bridge scour depth using non-dimensional dataset configurations in live-bed conditions. Table 9 
presents the statistical results obtained from all data-driven models and previous models. 
These results clearly demonstrate that ANFIS is the best model, with R = 0.986, RMSE = 0.062, 
MAPE = 6.767, E = 0.975, and Id = 0.987 for the overall dataset, followed by MARS and GEP. 

The statistical indices R, RMSE, MAPE, E, and Id have been calculated for all the developed 
models and the empirical expressions, and the results are listed in Table 9. Additionally, the 
performance index (PIm), a single multi-index criterion, is used for precise validation [42]. 
$$PI_m = \frac{1}{3}\left(\frac{R_{\min}}{R_m} + \frac{RMSE_m}{RMSE_{\max}} + \frac{MAPE_m}{MAPE_{\max}}\right) \tag{23}$$
The subscript m denotes each scour depth prediction model for live-bed conditions. The 
statistical performance criteria in Table 9 show that all the data-driven models, GEP 
(PIm = 0.417), M5-Tree (PIm = 0.438), MARS (PIm = 0.303), and ANFIS (PIm = 0.278), are more 
precise than the selected existing models in evaluating the scour depth in live-bed conditions. 
Among the existing relationships, the best performing for the present database is Breusers 
[50], which ranks fifth overall and has the lowest RMSE (0.244) and PIm (0.474) and the highest 
E (0.624) of the empirical equations. 
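
The ranking by performance index can be reproduced with the short Python sketch below. It 
assumes that the index takes the form PIm = (1/3)(Rmin/Rm + RMSEm/RMSEmax + MAPEm/MAPEmax), 
with the minimum and maximum taken over all thirteen models; this form reproduces the PIm 
values quoted in the text to within rounding. The R, RMSE, and MAPE values are copied from 
Table 9, and the variable and function names are hypothetical. 

```python
# (R, RMSE, MAPE) for each model, copied from Table 9.
table9 = {
    "GEP": (0.886, 0.183, 40.885),
    "M5-TREE": (0.836, 0.215, 37.903),
    "MARS": (0.978, 0.080, 15.225),
    "ANFIS": (0.986, 0.062, 6.767),
    "Laursen and Toch [8]": (0.812, 0.266, 66.533),
    "Larras [49]": (0.913, 0.316, 55.364),
    "Breusers [50]": (0.807, 0.244, 45.463),
    "Shen et al. [9]": (0.871, 0.919, 168.54),
    "Hancu [51]": (0.717, 0.277, 54.288),
    "Breusers [10]": (0.807, 0.279, 44.525),
    "Melville and Sutherland [11]": (0.807, 0.632, 112.97),
    "Melville [52]": (0.812, 0.517, 104.18),
    "Richardson and Davis [5]": (0.877, 0.421, 98.181),
}


def performance_index(r, rmse, mape, r_min, rmse_max, mape_max):
    """Assumed form of the multi-index criterion; lower values rank higher."""
    return (r_min / r + rmse / rmse_max + mape / mape_max) / 3.0


# Reference values are taken across all models being compared.
r_min = min(v[0] for v in table9.values())
rmse_max = max(v[1] for v in table9.values())
mape_max = max(v[2] for v in table9.values())

pi_m = {
    name: performance_index(r, rmse, mape, r_min, rmse_max, mape_max)
    for name, (r, rmse, mape) in table9.items()
}

# Rank 1 is the model with the smallest index.
ranking = sorted(pi_m, key=pi_m.get)
for rank, name in enumerate(ranking, start=1):
    print(rank, name, round(pi_m[name], 3))
```

Because each term is normalised by the worst value in the comparison set, the index is relative: 
adding or removing a model can change the PIm of every other model. 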


Shen et al. [9] has the worst statistical performance indices, with RMSE = 0.919, 
MAPE = 168.54, E = -4.292, and PIm = 0.940, and ranks thirteenth as the least accurate equation 
for evaluating scour depth in live-bed conditions on the present dataset. 


Table 9 
Performance indices of data-driven models and conventional empirical equations. 

Model | R | RMSE | MAPE | E | Id | PIm | Rank 
GEP | 0.886 | 0.183 | 40.885 | 0.79 | 0.863 | 0.417 | 3 
M5-TREE | 0.836 | 0.215 | 37.903 | 0.711 | 0.818 | 0.438 | 4 
MARS | 0.978 | 0.08 | 15.225 | 0.958 | 0.978 | 0.303 | 2 
ANFIS | 0.986 | 0.062 | 6.767 | 0.975 | 0.987 | 0.278 | 1 
Laursen and Toch [8] | 0.812 | 0.266 | 66.533 | 0.554 | 0.793 | 0.522 | 8 
Larras [49] | 0.913 | 0.316 | 55.364 | 0.371 | 0.831 | 0.486 | 7 
Breusers [50] | 0.807 | 0.244 | 45.463 | 0.624 | 0.815 | 0.474 | 5 
Shen et al. [9] | 0.871 | 0.919 | 168.54 | -4.292 | 0.561 | 0.940 | 13 
Hancu [51] | 0.717 | 0.277 | 54.288 | 0.515 | 0.686 | 0.541 | 9 
Breusers [10] | 0.807 | 0.279 | 44.525 | 0.509 | 0.521 | 0.485 | 6 
Melville and Sutherland [11] | 0.807 | 0.632 | 112.97 | -1.503 | 0.641 | 0.748 | 12 
Melville [52] | 0.812 | 0.517 | 104.18 | -0.678 | 0.685 | 0.687 | 11 
Richardson and Davis [5] | 0.877 | 0.421 | 98.181 | -0.107 | 0.761 | 0.619 | 10 


It is noteworthy that each prior relationship was developed for specific conditions, including 
particular characteristics of the fluid, flow, bed sediment, and pier type. Comparisons based 
on one particular dataset therefore do not establish that the conventional empirical equations 
are inadequate. 


5. Conclusions 


Proper estimation of scour depth around bridge piers aids the design of cost-effective and 
secure bridges that can withstand flood conditions. This research uses four data-driven 
modeling approaches, GEP, M5-TREE, MARS, and ANFIS, to evaluate scour depth in live-bed 
conditions. A total of 213 datasets from various field studies and laboratory experiments 
published in the literature were collected to generate a new model. The model created from this 
dataset better accommodates the variable input parameters that result from the ever-changing 
dynamic situation in live-bed conditions. Gamma test results reveal that scour depth variation 
in live-bed conditions depends on a combination of five dimensionless input parameters: 
sediment coarseness ratio, Froude number (Fr), flow intensity (U/Uc), gradation coefficient of 
the bed material (σg), and shape factor (Ks). When the results of the four data-driven models 
were compared, ANFIS performed best, followed by the MARS, GEP, and M5-TREE models, 
respectively. These data-driven models were then compared with nine conventional empirical 
equations. The findings showed that ANFIS outperformed the other models, with Breusers [50] and 
Shen et al. [9] having the lowest and highest errors in scour depth prediction among the 
empirical equations. Out of all the models chosen for the current database, the ANFIS model 
ranked first with PIm = 0.278. As the findings of the current study show, ANFIS is highly 
applicable and practical for predicting scour depth around a bridge pier in live-bed 
conditions. Under similar conditions and over a wide range of input parameters, it can be used 
effectively for such tasks. 